A Parallel Expressed Sequence Tag (EST) Clustering Program
Abstract
This paper describes the UIcluster software tool, which partitions Expressed Sequence Tag (EST) sequences and other genetic sequences into "clusters" based on sequence similarity. Ideally, each cluster will contain sequences that all represent the same gene. If a naïve approach such as an N × N comparison (where N is the number of input sequences) is taken, the problem is feasible only for very small data sets. UIcluster has been developed over the course of four years to solve this problem efficiently and accurately for large data sets consisting of tens or hundreds of thousands of EST sequences. The latest version of the application has been parallelized using the Message Passing Interface (MPI) standard. Both the computation and the memory requirements of the program can be distributed among multiple (possibly distributed) UNIX processes.
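The abstract contrasts UIcluster with a naïve all-against-all comparison. For N = 100,000 sequences, comparing every pair means N(N-1)/2 = 4,999,950,000 pairwise comparisons.

The abstract does not describe UIcluster's own algorithm. The sketch below therefore shows only one common, generic way to avoid the full N × N scan:

- Index each sequence by its length-k words (k-mers).
- Compare only pairs that share words.
- Merge pairs sharing at least `min_shared` distinct k-mers into single-linkage clusters with a union-find structure.

The names `cluster_ests`, `k` and `min_shared`, and their default values, are hypothetical. They illustrate the idea and are not UIcluster's interface.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k words in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    """Disjoint-set forest used to merge sequences into transitive clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(seqs, k=8, min_shared=3):
    """Hypothetical sketch: single-linkage clustering of sequences that share
    at least min_shared distinct k-mers. Not UIcluster's actual method."""
    # Inverted index: k-mer -> ids of sequences containing it (ids appended in increasing order).
    index = defaultdict(list)
    for i, s in enumerate(seqs):
        for w in kmers(s, k):
            index[w].append(i)

    # Count shared k-mers only for pairs that co-occur in some index bucket,
    # instead of scanning all N(N-1)/2 pairs.
    shared = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(ids, 2):  # i < j because ids are sorted
            shared[(i, j)] += 1

    # Merge every sufficiently similar pair; clusters are the connected components.
    uf = UnionFind(len(seqs))
    for (i, j), count in shared.items():
        if count >= min_shared:
            uf.union(i, j)

    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())
```

Consider three inputs where the first two overlap by 14 bases: `["ACGTACGTTGCAAGGCTTAC", "GTTGCAAGGCTTACCGATTA", "TTTTTGGGGGCCCCCAAAAA"]`. Here the first two sequences end up in one cluster and the third stays alone.

The sketch omits several things a production EST clusterer must handle:

- Reverse-complement matches.
- Low-complexity or repeat k-mers. A single very common word would generate a quadratic number of candidate pairs.
- Alignment-based verification of candidate pairs.
- Distributing the index and the comparisons across MPI processes.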
Similar resources
Massively parallel expressed sequence tag clustering
Expressed Sequence Tag (EST) sequencing is a highly efficient technique that samples expressed genes required for most cellular functions. While this is a well-studied problem and many software tools have been developed, large-scale EST clustering has previously been pursued through incremental approaches, a pipeline of programs and manual efforts to achieve a modest degree of parallelism. Here...
SEAN: SNP prediction and display program utilizing EST sequence clusters
SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.
Evaluating the Significance of Global and Local Features in Expressed Sequence Tag: A Clustering Quality Perspective
Clustering of expressed sequence tags (ESTs) plays an important role in gene analysis. Alignment-based sequence comparison is commonly used to measure the similarity between sequences, and several alignment-free comparisons have recently been introduced. In this paper, we evaluate the role of global and local features extracted from the alignment-free approaches, i.e., compression-based method...
Efficient clustering of large EST data sets on parallel computers.
Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the ...
Genetic Diversity and Population Structure of Iranian tulips revealed by EST-SSR and NBS-LRR Markers
The genus Tulipa L. (Liliaceae) comprises about 100 species, and Iran is considered one of the main origins of tulips. In this research, the genetic diversity and population structure of 27 wild populations of tulips collected from Iran were studied using 15 highly polymorphic and reproducible expressed sequence tag-simple sequence repeat (EST-SSR) markers and 8 nucleotide binding site (NBS)-enzyme...
Publication date: 2001